Abstract

Synthesis of reversible logic has received significant attention in recent
years, and many synthesis approaches for reversible circuits have been
proposed. In this paper, a library-based synthesis methodology for reversible
circuits is proposed in which a reversible specification is considered as a
permutation comprising a set of cycles. To this end, a pre-synthesis
optimization step is introduced to construct a reversible specification from
an irreversible function. In addition, a cycle-based representation model is
presented to be used as an intermediate format in the proposed synthesis
methodology. The selected intermediate format serves as a focal point for all
potential representation models.

In order to synthesize a given function, a library containing seven
building blocks is used where each building block is a cycle of length less
than 6. To synthesize large cycles, we also propose a decomposition algorithm
which produces all possible minimal and inequivalent factorizations for a
given cycle of length greater than 5. All decompositions contain the maximum
number of disjoint cycles. The generated decompositions are used in
conjunction with a novel cycle assignment algorithm, based on the graph
matching problem, to select the best possible cycle pairs. Then, each pair is
synthesized by using the available components of the library. The
decomposition algorithm together with the cycle assignment method is
considered as a binding method which selects a building block from the
library for each cycle. Finally, a post-synthesis optimization step is
introduced to optimize the synthesis results in terms of different costs.

To analyze the proposed methodology, various experiments are performed. Our
analyses on the available reversible benchmark functions reveal that the
proposed library-based synthesis methodology can produce low-cost circuits in
some cases compared with the current approaches. The proposed methodology
always converges and it typically synthesizes a given function fast. No
garbage line is used for even permutations.
1 Introduction



An n-input, n-output, fully specified Boolean function is reversible if it maps
each input pattern to a unique output pattern. A gate is called reversible if
it realizes a reversible function. In 1961, Landauer proved that using
conventional irreversible logic gates leads to a certain amount of energy
dissipation per irreversible bit operation regardless of the underlying
technology [1]. In 1973, Bennett stated that to avoid power dissipation in a
circuit, it must be built from reversible gates [2].

Energy consumption has become one of the most challenging problems in
digital circuit design. To reduce power dissipation in CMOS circuits, numerous
approaches have been proposed in recent years which improve the non-ideal
behavior of transistors and materials [3]. However, such methods cannot provide
zero energy dissipation if irreversible bit operations are permitted.

While heat generation due to information loss in modern CMOS circuits
seems to be small compared with the other sources of power dissipation, it has
been shown that the power dissipation resulting from information loss is at
least 0.147 W for a fully loaded Intel Itanium-2 processor [4]. In addition,
heat removal will become more difficult with the increasing density of CMOS
integrated circuits [5]. Currently, reversible computing has received
considerable attention, in particular in low-power CMOS design [6].

Besides the power consumption problem of CMOS digital circuits, the
unceasing miniaturization of integrated circuits is widely expected to end
within the coming years. This problem leads researchers to investigate new
computational paradigms. Among them, quantum computing seems to be the most
promising approach [8]. Quantum gates are inherently reversible [9]. Thus,
reversible logic has also found great interest in the domain of quantum
computation. As such, various Boolean reversible gates are used in different
quantum algorithms [10]. While the advantages of quantum computing are not
fully available without pure quantum gates, constructing efficient circuits
with Boolean reversible gates is considered an important step towards the
realization of quantum systems [8], [11].

Boolean reversible circuit synthesis is defined as the ability to generate a
reversible circuit from a given Boolean reversible specification. Synthesis of
reversible logic differs from that of irreversible circuits because of various
constraints imposed by reversibility. For example, loops and fanout are not
allowed in reversible logic. Therefore, available irreversible synthesis
approaches cannot be applied directly to synthesize reversible circuits. To
address this need, several synthesis algorithms for reversible functions have
been proposed where both exact [12, 13] and heuristic approaches [8, 14-16]
have been applied.

Exact synthesis algorithms use methods such as Boolean satisfiability (SAT)
[13] or symbolic reachability analysis [12] to obtain optimal circuits for
reversible specifications. More precisely, exact approaches first define a set
of equations to model the synthesis stage as a well-defined problem (e.g.,
SAT). Then, available solvers are applied to find at least one solution (i.e.,
a synthesized circuit) for the given specification. However, due to the
exponential growth of the search space,¹ such approaches are useful only for
obtaining optimal circuits for small specifications and cannot handle
relatively large functions.

On the other hand, several heuristic methods have been proposed to find
an efficient circuit for a given specification, where the term 'efficiency'
can be defined according to various metrics [17]. Among the available metrics,
'quantum cost' is widely accepted to be used in the synthesis stage. However,
based on the selected target technology,² the consideration of one specific
metric may be more important than the others. For example, while the number of
garbage lines can be ignored for Boolean CMOS reversible circuits, it is very
important for quantum and Boolean reversible circuits used in quantum logic.
Hence, approaches that use an arbitrary number of garbage lines (e.g., [15])
cannot be applied to quantum logic.



In [19], an NCT-based synthesis algorithm was proposed that considers
reversible functions as a set of cycles where each cycle is implemented by
several reversible gates. By extending the results of [19], this paper
proposes a library-based synthesis methodology for reversible circuits which
uses the NCT gate library, where binding and optimization methods along with a
set of building blocks are introduced to be used in a unified library-based
synthesis methodology. The rest of the paper is organized as follows. Basic
concepts are introduced in Section 2. The synthesis algorithm of [19] is
described in Section 3. The proposed library-based synthesis methodology is
introduced in Section 4. Experimental results are presented in Section 5 and
finally, Section 6 concludes the paper.



2 Basic Concepts 
2.1 Reversible Logic 

Let $A$ be any set and define $f : A \to A$ as a one-to-one and onto transition
function. The function $f$ is called a permutation function, as applying $f$
to $A$ leads to a set with the same elements as $A$, probably in a different
order. If $A = \{1, 2, 3, \ldots, m\}$, there exist two elements $a_i$ and
$a_j$ belonging to $A$ such that $f(a_i) = a_j$. A $k$-cycle with length $k$
is denoted as $(a_1, a_2, \ldots, a_k)$, which means that $f(a_1) = a_2$,
$f(a_2) = a_3$, ..., and $f(a_k) = a_1$. A given $k$-cycle
$(a_1, a_2, \ldots, a_k)$ could be written in many different ways such as
$(a_2, a_3, \ldots, a_k, a_1)$. A cycle of length 2 is called a transposition.

Cycles $c_1$ and $c_2$ are called disjoint if they have no common members,
i.e., $\forall a_i \in c_1, a_i \notin c_2$. Any permutation can be written
uniquely, except for the order, as a product of disjoint cycles. The unique
cycle form of a permutation is called canonical cycle form (CCF) [19]. If two
cycles $c_1$ and $c_2$ are disjoint, they commute, i.e., $c_1 c_2 = c_2 c_1$.
In addition, a cycle may be written in different ways as a product of
transpositions, and using different numbers of transpositions. For example,
the 3-cycle (1,2,4) can be written as a product of two transpositions as
(1,2)(1,4).

¹Exact modelings are done based on the characterizations of the input
specification such as the number of input lines and the number of required
gates.

²Several different quantum computing technologies with different strengths and
challenges have been developed so far. Examples are ion traps, quantum dots,
linear optics and NMR. See [18] for different quantum technologies.
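
The cycle notions above can be checked mechanically. The following is a
minimal Python sketch, not taken from the paper: the helper names
`cycle_to_map`, `product` and `ccf` are illustrative assumptions, and cycles
are composed from left to right, the convention stated later in Example 1.

```python
def cycle_to_map(cycle):
    """Mapping of one cycle (a1, ..., ak): a1 -> a2, ..., ak -> a1."""
    return {a: cycle[(i + 1) % len(cycle)] for i, a in enumerate(cycle)}


def product(cycles, domain):
    """Compose cycles left to right (the leftmost cycle is applied first)."""
    result = {}
    for x in domain:
        y = x
        for c in cycles:
            y = cycle_to_map(c).get(y, y)
        result[x] = y
    return result


def ccf(perm):
    """Disjoint-cycle form of a permutation given as a dict.

    Fixed points are dropped; each cycle starts at its smallest element.
    """
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen or perm[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        cycles.append(tuple(cycle))
    return cycles


domain = range(1, 5)
# (1,2)(1,4) is the 3-cycle (1,2,4).
print(ccf(product([(1, 2), (1, 4)], domain)))  # [(1, 2, 4)]
# Disjoint cycles commute.
print(product([(1, 2), (3, 4)], domain) == product([(3, 4), (1, 2)], domain))  # True
```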

A cycle (or a permutation) is called even if it can be written as an even
number of transpositions. A similar definition is introduced for an odd cycle.
Although there may be many ways to decompose a given cycle into a set of
transpositions, the parity of the number of transpositions used remains the
same, i.e., all resulting decompositions have the same even/odd number of
transpositions. It can be verified that for a given even (odd) value of $k$,
the resulting $k$-cycle can be written as an odd (even) number of
transpositions. Hence, a $k$-cycle is odd (even) if $k$ is even (odd). Each
reversible function can be considered as a permutation function.
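
Since a $k$-cycle can be written with $k - 1$ transpositions, the parity of a
permutation follows directly from the lengths of its disjoint cycles. The
short Python sketch below illustrates this rule; the function name
`parity_from_cycles` is an assumption made for illustration only.

```python
def parity_from_cycles(cycle_lengths):
    """Return 'even' or 'odd' for a permutation with the given cycle lengths.

    A k-cycle is a product of k - 1 transpositions, so only the sum of
    (k - 1) over all disjoint cycles, taken modulo 2, matters.
    """
    transpositions = sum(k - 1 for k in cycle_lengths)
    return "even" if transpositions % 2 == 0 else "odd"


print(parity_from_cycles([3]))     # even: a 3-cycle is two transpositions
print(parity_from_cycles([2]))     # odd: a single transposition
print(parity_from_cycles([2, 2]))  # even: a pair of 2-cycles
```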

A generalized Toffoli gate $C^m$NOT$(x_1, x_2, \ldots, x_{m+1})$ passes the
first $m$ lines unchanged. These lines are referred to as control lines. This
gate flips the $(m+1)$th line (i.e., target) if and only if the control lines
are all one. Therefore, the generalized Toffoli gate works as follows:
$x_i^{(out)} = x_i$ for $i < m + 1$, and
$x_{m+1}^{(out)} = x_1 x_2 \cdots x_m \oplus x_{m+1}$. For $m = 0$ and
$m = 1$, the gates are called NOT and CNOT, respectively. For $m = 2$, the
gate is called $C^2$NOT or Toffoli.
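
The gate definition above can be simulated on bit tuples. The following Python
sketch is illustrative only; the function name `cm_not` and the convention of
indexing lines from 0 are assumptions. It checks that a Toffoli gate permutes
the input patterns and is its own inverse.

```python
from itertools import product


def cm_not(state, controls, target):
    """Apply a generalized Toffoli gate to a tuple of bits.

    controls: indices of the control lines (empty for a NOT gate).
    target:   index of the line that is flipped when all controls are 1.
    """
    bits = list(state)
    if all(bits[c] == 1 for c in controls):
        bits[target] ^= 1
    return tuple(bits)


inputs = list(product((0, 1), repeat=3))
outputs = [cm_not(s, (0, 1), 2) for s in inputs]

print(sorted(outputs) == inputs)  # True: the gate is a bijection
print(all(cm_not(cm_not(s, (0, 1), 2), (0, 1), 2) == s for s in inputs))  # True
print(cm_not((1, 1, 0), (0, 1), 2))  # (1, 1, 1)
```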

In addition to the $C^m$NOT gate, several other gates have been proposed
previously [19]. Among them, controlled-$V$ (controlled-$V^{\dagger}$) changes
the value on its target line using the transformation given by the matrix $V$
($V^{\dagger}$) if the control line has the value of 1.

$$V = \frac{1+i}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix} \tag{1}$$
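
As a quick sanity check, the sketch below assumes the standard
square-root-of-NOT form $V = \frac{1+i}{2}\begin{pmatrix} 1 & -i \\ -i & 1
\end{pmatrix}$ and verifies numerically that applying $V$ twice gives NOT and
that $V^{\dagger}$ undoes $V$. The helper names are illustrative assumptions.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def close(a, b, eps=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < eps for i in range(2) for j in range(2))


c = (1 + 1j) / 2
V = [[c, c * -1j], [c * -1j, c]]
NOT = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

print(close(matmul2(V, V), NOT))               # True: V is a square root of NOT
print(close(matmul2(V, dagger(V)), IDENTITY))  # True: V is unitary
```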



To physically realize a synthesized circuit, all complex gates should be
decomposed into a set of primitive gates. It has been shown that all one-qubit
gates and a standard two-qubit gate, usually CNOT, can be used for such a
decomposition [20]. In [21], all two-qubit quantum gates were used during the
decomposition. The gates NOT, CNOT, controlled-$V$, and
controlled-$V^{\dagger}$ have been efficiently simulated in some quantum
computer technologies [22]. These gates were studied in the literature and are
considered as elementary gates for reversible Boolean functions [10], [23]. We
use the same set of elementary gates throughout the paper. The number of
elementary gates required for simulating a given gate is called quantum cost.
Inputs (outputs) that are not required in the specification of a reversible
function are called constant (garbage or auxiliary) bits.

Figure 1: (a) A sample reversible circuit, (b) input specification in truth
table notation without constant and garbage lines, (c) with constant and
garbage lines, (d) in CCF notation, and (e) in PPRM notation

Positive polarity Reed-Muller (PPRM) expansion can also be used to describe a
reversible specification. PPRM expansion uses only un-complemented (or
positive) variables and it can be derived from the EXOR-Sum-of-Products (ESOP)
description by replacing $a'$ with $a \oplus 1$ for a complemented variable
$a$. In addition, some algebraic manipulation of product terms may also be
done to simplify the equations. The PPRM expansion of a function is canonical
and is defined as:

$$f(x_1, x_2, \ldots, x_n) = a_0 \oplus a_1 x_1 \oplus \cdots \oplus a_n x_n \oplus a_{12} x_1 x_2 \oplus \cdots \oplus a_{n-1,n} x_{n-1} x_n \oplus \cdots \oplus a_{12 \ldots n} x_1 x_2 \cdots x_n \tag{2}$$
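
One common way to obtain the PPRM coefficients of a single output from its
truth table is the binary Moebius (Reed-Muller) transform. The Python sketch
below illustrates that technique; it is not the paper's procedure, and the
function names and the bit ordering (bit $j$ of the row index holds
$x_{j+1}$) are assumptions.

```python
def pprm_coefficients(truth_table):
    """PPRM coefficients of a single-output Boolean function.

    truth_table[k] is the output for input index k, where bit j of k
    (least significant bit first) is the value of variable x_{j+1}.
    coeffs[mask] is the coefficient of the product of the variables in mask.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1  # table length is 2**n
    for j in range(n):
        bit = 1 << j
        for k in range(len(coeffs)):
            if k & bit:
                coeffs[k] ^= coeffs[k ^ bit]
    return coeffs


def pprm_terms(truth_table):
    """Readable list of the product terms whose coefficient is 1."""
    terms = []
    for mask, c in enumerate(pprm_coefficients(truth_table)):
        if c:
            names = [f"x{j + 1}" for j in range(mask.bit_length()) if mask >> j & 1]
            terms.append("".join(names) or "1")
    return terms


print(pprm_terms([0, 1, 1, 1]))  # OR: ['x1', 'x2', 'x1x2']
print(pprm_terms([1, 0]))        # NOT x1: ['1', 'x1']
```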

A sample reversible circuit which includes one constant line with the initial
value 1 and two garbage lines (shown by the symbol g) is depicted in Figure 1.
The input specification in different notations is also illustrated in this
figure.

It has been shown that for $n \ge 5$ and
$m \in \{3, 4, \ldots, \lceil n/2 \rceil\}$, a $C^m$NOT gate can be simulated
by $12m - 22$ elementary gates. In addition, for $n \ge 7$, a $C^{n-2}$NOT
gate can be simulated by $24n - 88$ elementary gates with no auxiliary bits
[24]. On the other hand, a $C^{n-1}$NOT gate can be simulated with an
exponential cost of $2^n - 3$ if no garbage line is available [10]. To avoid
the exponential size and the need for a large number of elementary gates,
several researchers used an extra garbage line for an efficient simulation of
the $C^{n-1}$NOT gate [8]. Generally, the number of available bits is very
restricted in today's reversible and quantum implementations [25]. Therefore,
for two circuits with equal linear costs, the one without garbage lines is
preferred.
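
To give a feeling for the gap between the linear and exponential costs quoted
above, the following Python sketch tabulates them. It assumes the reading
$12m - 22$, $24n - 88$ (for $n \ge 7$) and $2^n - 3$ of the three cost
expressions; the function names are illustrative.

```python
def cost_cm_not(m):
    """12m - 22 elementary gates for a C^m NOT gate (m >= 3, enough free lines)."""
    return 12 * m - 22


def cost_linear(n):
    """24n - 88 elementary gates, the linear bound quoted for n >= 7."""
    return 24 * n - 88


def cost_exponential(n):
    """2^n - 3 elementary gates when no garbage line is available."""
    return 2 ** n - 3


for n in range(7, 11):
    print(n, cost_linear(n), cost_exponential(n))
# 7 80 125
# 8 104 253
# 9 128 509
# 10 152 1021
```
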
2.2 Cycle Factorization 

A reversible specification can be considered as a permutation function which 
includes a set of cycles of various lengths. On the other hand, a given cycle of 
length greater than two can be factorized into several cycles of smaller lengths. 
Let $\sigma_1, \ldots, \sigma_m$ be a factorization of the cycle
$(a_1, a_2, \ldots, a_n)$ into a product of smaller cycles. We say the
factorization is of type $\alpha = (\alpha_2, \ldots, \alpha_k)$ if among the
$\sigma_j$ ($1 \le j \le m$) there are exactly $\alpha_2$ 2-cycles, $\alpha_3$
3-cycles and so on. Let us define:

$$\langle \alpha \rangle = \sum_{j \ge 2} (j - 1)\,\alpha_j \tag{3}$$

where $\alpha$ satisfies $\langle \alpha \rangle \ge n - 1$. For the case of
equality, the factorization is called minimal. Two cycle factorizations are
called equivalent if one can be obtained from the other by repeatedly
exchanging adjacent factors that are disjoint.

Example 1 Consider a given cycle $\pi = (a, b, c, d, e)$ of length $n = 5$. It
can be verified that $\pi$ can be factorized into $(a, b)(a, c)(a, d, e)$ with
the cycle type $(2, 1)$. Note that cycles are applied from left to right. For
this factorization, we have
$\langle \alpha \rangle = 1 \times 2 + 2 \times 1 = 4$. Since
$\langle \alpha \rangle = n - 1$, this factorization is minimal.
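
Example 1 can be verified with a few lines of Python. This is an illustrative
sketch: the helper names `apply_cycles` and `weight` are assumptions, cycles
are applied from left to right as stated in the example, and `weight`
evaluates the sum of (length - 1) over all factors, i.e., the quantity of
Equation (3).

```python
def apply_cycles(cycles, x):
    """Apply the cycles to element x from left to right."""
    for c in cycles:
        if x in c:
            x = c[(c.index(x) + 1) % len(c)]
    return x


def weight(cycles):
    """Sum of (length - 1) over all factors, as in Equation (3)."""
    return sum(len(c) - 1 for c in cycles)


target = ("a", "b", "c", "d", "e")
factors = [("a", "b"), ("a", "c"), ("a", "d", "e")]

same = all(apply_cycles(factors, x) == apply_cycles([target], x) for x in target)
print(same)                                # True: the product equals (a, b, c, d, e)
print(weight(factors), len(target) - 1)    # 4 4: the factorization is minimal
```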



Cycle factorization has a rich history in combinatorial problems [26-28]. In
particular, a significant effort has been directed to count the number of
$k$-cycle factorizations. The case $k = 2$ (transposition factors) is known as
the Hurwitz problem [27]. The following formula gives the number of 2-cycle
factorizations of any permutation of cycle type
$(\alpha_1, \ldots, \alpha_m)$:

$$n^{m-3}\,(n + m - 2)!\,\prod_{i=1}^{m} \frac{\alpha_i^{\alpha_i + 1}}{\alpha_i!} \tag{4}$$

It has been proved that the number of inequivalent 2-cycle factorizations of
the cycle $(1, 2, \ldots, n)$ is the generalized Catalan number [28]

$$\frac{1}{2n - 1}\binom{3n - 3}{n - 1} \tag{5}$$

The following theorems examine the number of cycle factorizations for general
cases. In this paper, cycle factorization is used to extract library elements
from a given reversible specification as discussed in Section 4 in detail.

Theorem 1 Let $i = (i_2, i_3, \ldots)$ be a sequence of nonnegative integers
and set $r = r(i) = i_2 + i_3 + \cdots$. Then, the number of cycle
factorizations of $(1, 2, \ldots, n)$ with cycle index $i$ is

$$n^{r-1}\,\frac{r!}{\prod_{k \ge 2} i_k!}$$

in the case that $n + r - 1 = \sum_{k \ge 2} k\,i_k$, and zero otherwise.

Theorem 2 (from [29]) Let $i = (i_2, i_3, \ldots)$ be a sequence of
nonnegative integers, not all zero, and $r = r(i) = i_2 + i_3 + \cdots$. Then,
the number of inequivalent cycle factorizations of $(1, 2, \ldots, n)$ with
cycle index $i$ is

$$\frac{(2n + r - 2)!}{(2n - 1)!\,\prod_{k \ge 2} i_k!}$$

if $n + r - 1 = \sum_{k \ge 2} k\,i_k$, and zero otherwise.
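
The two counting results can be cross-checked on small cases. The Python
sketch below assumes the readings $n^{r-1}\, r! / \prod_k i_k!$ for Theorem 1
and $(2n + r - 2)! / ((2n - 1)! \prod_k i_k!)$ for Theorem 2; these forms are
reconstructions, not quotations. It compares them with a brute-force count of
transposition factorizations and with the generalized Catalan number of
Equation (5). All function names are illustrative.

```python
from itertools import combinations, product
from math import comb, factorial, prod


def is_minimal_index(n, index):
    """Check n + r - 1 == sum(k * i_k); index maps cycle length k to i_k."""
    r = sum(index.values())
    return n + r - 1 == sum(k * i for k, i in index.items())


def count_factorizations(n, index):
    """Theorem 1 as read here: n^(r-1) * r! / prod(i_k!)."""
    if not is_minimal_index(n, index):
        return 0
    r = sum(index.values())
    return n ** (r - 1) * factorial(r) // prod(factorial(i) for i in index.values())


def count_inequivalent(n, index):
    """Theorem 2 as read here: (2n+r-2)! / ((2n-1)! * prod(i_k!))."""
    if not is_minimal_index(n, index):
        return 0
    r = sum(index.values())
    denominator = factorial(2 * n - 1) * prod(factorial(i) for i in index.values())
    return factorial(2 * n + r - 2) // denominator


def generalized_catalan(n):
    """Equation (5): binom(3n-3, n-1) / (2n-1)."""
    return comb(3 * n - 3, n - 1) // (2 * n - 1)


def brute_force_transposition_count(n):
    """Count (n-1)-tuples of transpositions whose product is (1, 2, ..., n)."""
    transpositions = list(combinations(range(1, n + 1), 2))
    count = 0
    for seq in product(transpositions, repeat=n - 1):
        ok = True
        for x in range(1, n + 1):
            y = x
            for a, b in seq:  # apply the factors from left to right
                if y == a:
                    y = b
                elif y == b:
                    y = a
            if y != x % n + 1:
                ok = False
                break
        count += ok
    return count


n = 4
only_transpositions = {2: n - 1}
print(brute_force_transposition_count(n), count_factorizations(n, only_transpositions))  # 16 16
print(count_inequivalent(n, only_transpositions), generalized_catalan(n))                # 12 12
```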
2.3 Graph Matching



In order to select library elements in the proposed library-based synthesis
methodology (Section 4.4), an available graph perfect matching algorithm is
applied. Given a graph $G = (V, E)$, a matching $M$ in $G$ is a set of
pairwise non-adjacent edges; that is, no two edges share a common vertex. A
vertex is matched if it is incident to an edge in the matching. Otherwise the
vertex is unmatched. A maximum matching is a matching that contains the
largest possible number of edges. There may be many maximum matchings.

A perfect matching is a matching which matches all vertices of the graph.
That is, every vertex of the graph is incident to exactly one edge of the
matching. In a weighted bipartite graph, each edge has an associated value. A
minimum weighted bipartite matching is defined as a perfect matching where the
sum of the values of the edges in the matching has a minimal value. If the
graph is not complete bipartite, missing edges are inserted with value zero.
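
For very small graphs, a minimum weighted bipartite matching can be found by
trying every perfect matching. The Python sketch below does exactly that for a
square cost matrix; it is a brute-force illustration of the definition, not
the matching algorithm used in the paper, and the cost values are made up.

```python
from itertools import permutations


def min_weight_perfect_matching(cost):
    """Brute-force minimum weighted perfect matching.

    cost[i][j] is the value of the edge between left vertex i and right
    vertex j of a complete bipartite graph (square matrix).
    Returns (total value, list of matched (left, right) pairs).
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    total = sum(cost[i][best[i]] for i in range(n))
    return total, [(i, best[i]) for i in range(n)]


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(min_weight_perfect_matching(cost))  # (5, [(0, 1), (1, 0), (2, 2)])
```

The running time grows factorially with the number of vertices, so a
polynomial-time method (for example the Hungarian algorithm) would be needed
for realistic sizes.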



3 Previous Work 



Several authors discussed the requirements of a design methodology for
reversible and quantum circuits. In [30], a computer-aided design flow for
quantum computation was presented that transforms a high-level language
program into a technology-specific implementation. In addition, the languages
and transformations needed to represent and optimize a quantum algorithm in
the proposed design flow were discussed. The authors of [31] introduced an
HDL-based simulation methodology for quantum circuits where the HDL feature of
describing a circuit with both structural and functional architectures was
employed in the proposed methodology. In [32], the authors proposed an
instruction set architecture and several tools such as a compiler, a device
scheduler and a simulator for ion trap based quantum computers. A
computer-aided design flow for quantum circuits was proposed in [33] which
includes automatic layout and control logic extraction. In addition, several
heuristics for the placement and routing of quantum circuits in ion trap
technology were presented in [33]. In the following paragraphs, the papers
published on the synthesis of reversible circuits are discussed.

The synthesis of reversible circuits composed of generalized Toffoli gates has
been studied extensively [8, 14-16, 34]. Since the cost of a generalized
Toffoli gate in terms of the physical implementation is high, to realize a
complex generalized Toffoli gate it should be decomposed into some elementary
gates [24]. Although this approach was adopted more in the previous years, a
direct synthesis method that uses simple elementary gates could behave more
efficiently. To this end, a few papers [11, 19, 35] were published in recent
years which used the NCT gate library containing simple low-cost NOT (N), CNOT
(C) and Toffoli (T) gates. The authors of [11] proposed an NCT-based synthesis
method which applies N, T, C and T gates in order (i.e., the T|C|T|N method)
to synthesize a given permutation. In the first C|T|N part, the terms 0 and
$2^i$ of a given reversible function are positioned at their right locations
while the last Toffoli network places the other truth table terms in their
right positions.

Figure 2: The $\pi_2$ circuit for the (2,2) synthesis algorithm [19]

In [11], for the last Toffoli part, a given $k$-cycle is decomposed into a set
of transpositions. Subsequently, each pair of disjoint transpositions
$(a, b)(c, d)$ is implemented by a circuit (i.e., the $\pi$ circuit) that maps
$a$, $b$, $c$ and $d$ to $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$,
respectively, where $n$ is the number of bits in the function specification.
Then, the permutation $(2^n - 4, 2^n - 3)(2^n - 2, 2^n - 1)$ is implemented by
a circuit called $\kappa_0$. Finally, the reverse $\pi$ circuit, i.e.,
$\pi^{-1}$, is applied to transform $2^n - 4$, $2^n - 3$, $2^n - 2$ and
$2^n - 1$ into $a$, $b$, $c$ and $d$, respectively. It can be verified that
the $\pi\kappa_0\pi^{-1}$ circuit implements the permutation $(a, b)(c, d)$.
An extension of [11] was suggested in [35] which produces better quantum cost
by applying the unit-cost NOT and CNOT gates instead of using Toffoli gates
with cost 5 in many situations.
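
The conjugation argument above can be checked at the level of permutations.
In the Python sketch below, `pi` is an arbitrary bijection that sends $a$,
$b$, $c$, $d$ to the last four terms; in the actual method $\pi$ is a circuit
built from reversible gates, so this choice is an assumption made purely for
illustration, as are all function and variable names.

```python
def conjugate(n, a, b, c, d):
    """Return the permutation obtained by applying pi, kappa0, then pi^-1."""
    size = 2 ** n
    sources = [a, b, c, d]
    targets = [size - 4, size - 3, size - 2, size - 1]

    # pi: any bijection on {0, ..., 2^n - 1} with a, b, c, d -> last four terms.
    pi = dict(zip(sources, targets))
    rest_sources = [x for x in range(size) if x not in sources]
    rest_targets = [x for x in range(size) if x not in targets]
    pi.update(zip(rest_sources, rest_targets))
    pi_inverse = {v: k for k, v in pi.items()}

    # kappa0: the fixed permutation (2^n-4, 2^n-3)(2^n-2, 2^n-1).
    kappa0 = {x: x for x in range(size)}
    kappa0[size - 4], kappa0[size - 3] = size - 3, size - 4
    kappa0[size - 2], kappa0[size - 1] = size - 1, size - 2

    return {x: pi_inverse[kappa0[pi[x]]] for x in range(size)}


result = conjugate(7, 5, 3, 9, 67)
moved = {x: y for x, y in result.items() if x != y}
print(moved)  # {3: 5, 5: 3, 9: 67, 67: 9}
```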



In our previous work [19], a cycle-based synthesis algorithm was proposed
based on the results of [35] where cycles of lengths less than 4 are
synthesized directly. More exactly, in [19] a set of synthesis algorithms was
proposed to synthesize a pair of 2-cycles, a single 3-cycle, and a pair of
3-cycles. Each cycle is called a building block or an elementary cycle. In
order to improve the synthesis cost, the authors extended the building blocks
to include a single 4-cycle followed by a single 4-cycle or a single 2-cycle,
a single 5-cycle and a pair of 5-cycles. In addition, we used NOT and CNOT
gates instead of Toffoli gates in many situations.

Example 2 Assume that the pair of 2-cycles (5,3) (9,67) should be implemented.
To this end, the term 5 is transformed to 4 by a CNOT gate (gate #1 in Fig. 4)
which has no effect on other terms. Similarly, 3 is transformed to 1 by a CNOT
gate (gate #2 in Fig. 4) which changes the term 9 to 11 and 67 to 65. Then, 11
is transformed to 2 by two CNOT gates (gate #3, gate #4 in Fig. 4) with no
effect on other terms. Finally, 65 is transformed to 67 by a CNOT gate (gate
#5 in Fig. 4). Then, a pre-designed circuit, such as the one shown in Figure 2
($\pi_2$), is applied followed by the circuit shown in Figure 3 ($\kappa_0$).
Afterwards, the gates applied before the $\kappa_0$ circuit are applied in the
reverse order. Fig. 4 illustrates the complete circuit.
Figure 3: The $\kappa_0$ circuit [19]

Figure 4: A sample circuit synthesized by the method of [19]



On the other hand, to synthesize a given large cycle of length $k$ ($k > 3$),
the authors used one possible decomposition to extract the suggested building
blocks (i.e., cycles) from the input specification (i.e., permutation) that
leads to a set of cycles of length 3 and probably a cycle of length less than
3. Since we use an extended set of building blocks here, the decomposition
algorithm was modified to detach 5-cycles. Therefore, the result of the
decomposition algorithm is a set of cycles of length 5 and probably a cycle of
length less than 5. As the synthesis of a cycle pair is more efficient than
the synthesis of two single cycles by using the method of [19], cycle pairs
are explored during the synthesis as discussed in the following sections in
detail.

Example 3 Consider a given permutation $\pi$ = (3, 5, 6, 7, 9, 10, 11, 12, 13,
14, 15, 17, 18, 19, 20, 21) (22, 23, 24, 25, 26, 27) (28, 29) (30, 31). It can
be verified that applying the decomposition algorithm of [19] for detaching
all 5-cycles leads to $\pi$ = (3, 5, 6, 7, 9) (10, 11, 12, 13, 14) (15, 17,
18, 19, 20) (22, 23, 24, 25, 26) (21, 3, 10, 15) (22, 27) (28, 29) (30, 31).
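
The decomposition in Example 3 can be checked by applying both cycle lists to
every element. The Python sketch below assumes left-to-right application of
the cycles and reads the fourth element of the first cycle as 7, in agreement
with the decomposed form; the helper name `apply_cycles` is illustrative.

```python
def apply_cycles(cycles, x):
    """Apply the cycles to element x from left to right."""
    for c in cycles:
        if x in c:
            x = c[(c.index(x) + 1) % len(c)]
    return x


original = [
    (3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21),
    (22, 23, 24, 25, 26, 27),
    (28, 29),
    (30, 31),
]
decomposed = [
    (3, 5, 6, 7, 9), (10, 11, 12, 13, 14), (15, 17, 18, 19, 20),
    (22, 23, 24, 25, 26), (21, 3, 10, 15), (22, 27), (28, 29), (30, 31),
]

elements = sorted({x for c in original for x in c})
print(all(apply_cycles(original, x) == apply_cycles(decomposed, x)
          for x in elements))  # True
# The total of (length - 1) over all cycles is preserved, so the
# factorization is minimal.
print(sum(len(c) - 1 for c in original), sum(len(c) - 1 for c in decomposed))  # 22 22
```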

Let $d_{r_1, r_2, \ldots, r_k}(n, k)$ be the number of permutations with
exactly $k$ cycles of lengths $r_1, r_2, \ldots, r_k$ for a set of $n$
distinct numbers. The falling factorial $(n)_k$ is defined as
$n(n-1)(n-2)\cdots(n-k+1)$. The size of each building block can be determined
as $d_{2,2}(n, 2) = (n)_4$, $d_3(n, 1) = (n)_3$, $d_{3,3}(n, 2) = (n)_6$,
$d_{4,2}(n, 2) = (n)_6$, $d_5(n, 1) = (n)_5$, $d_{5,5}(n, 2) = (n)_{10}$. To
prove this, consider a pair of two cycles $(a, b)(c, d)$. For the first
element $a$, all $n$ elements can be selected. For the next element $b$,
$n - 1$ elements can be selected, and so on.
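
The block sizes above are falling factorials, which Python exposes as
`math.perm`. The sketch below evaluates them for a set of 16 distinct numbers
(for instance the rows of a 4-bit truth table, an example chosen here for
illustration); the dictionary of block names is an assumption that follows the
reading of the subscripts given above.

```python
from math import perm


def falling_factorial(n, k):
    """(n)_k = n (n - 1) ... (n - k + 1)."""
    return perm(n, k)


# Number of elements used by each building block (sum of its cycle lengths).
block_elements = {"(2,2)": 4, "(3)": 3, "(3,3)": 6, "(4,2)": 6, "(5)": 5, "(5,5)": 10}

n = 16
for name, k in block_elements.items():
    print(name, falling_factorial(n, k))

print(falling_factorial(16, 4))  # 43680
```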

In the next section, we propose a library-based synthesis methodology for
reversible circuits based on the results of [19].



4 The Proposed Synthesis Methodology 

The proposed synthesis methodology is shown in Figure 5. In order to
synthesize a given input specification, a pre-synthesis optimization is
applied on the given function to improve it with respect to some metrics
(Section 4.1). Subsequently, a CCF representation is extracted from the
prepared input specification (Section 4.2) and then is gradually mapped into a
reversible circuit. To this end, if the cycle length is greater than 5, we
apply a cycle decomposition algorithm (Section 4.3) to construct elementary
cycles. Next, a cycle assignment method (Section 4.4) is applied to construct
cycle pairs based on the well-known graph matching problem. Then, each pair is
synthesized by applying the method of [19] and finally a post-synthesis
optimization is applied to improve the circuit cost (Section 4.5).



4.1 Pre-Synthesis Optimization 

As discussed in Section 2, an n-input, n-output, fully specified Boolean
function is reversible if it maps each input pattern to a unique output
pattern. Hence, a reversible specification must have 'the same number of
inputs and outputs' with 'unique assignments'.³ For example, reconsider the
circuit shown in Figure 1-(a) which contains one constant input and two
garbage outputs. As illustrated in Figure 1-(b), the initial function
specification (without constant and garbage lines) does not have the
characteristics of a reversible specification. However, after the insertion of
constant and garbage lines and unique output assignments (see Figure 1-(c)), a
reversible specification of size 3 results.

Since the values of constant and garbage lines and their locations with
respect to other lines are not in the initial specification of an irreversible
function, such parameters can be manipulated by a synthesis tool to improve
the final cost. Hence, the values of constant and garbage lines are called
don't cares (DC). The goal of the pre-synthesis optimization is to assign
appropriate values to DC lines (DC assignment) and place them at proper
locations (constant and garbage assignment) to improve the cost. Such
optimizations are mandatory for irreversible specifications and can be ignored
if completely specified functions are addressed as done in this paper.

It is worth noting that some DC assignment algorithms have been proposed
recently [34]. However, as the efficiency of such assignments depends on the
characteristics of a synthesis algorithm, it is not possible to use a
well-developed pre-synthesis optimization algorithm for all synthesis
approaches.⁴

Figure 5: Proposed Synthesis Methodology

³Recall the truth table notation of reversible circuits. For a reversible
circuit with n inputs and n outputs (i.e., a circuit of size n), a truth table
of size $n \times 2^n$ is required where the values of outputs are uniquely
selected from 0 to $2^n - 1$, probably in a different order. The goal of a
synthesis algorithm is to put outputs at their right locations (i.e., 0 to
$2^n - 1$ sequentially) by applying a set of reversible gates.



4.2 Intermediate Format 

Different synthesis algorithms used different representations for their input
specifications. Among the available models, truth table [8, 14, 24] and PPRM
expansions [16, 37] have been widely used. The selected model works as an
intermediate format (IF) for the respective synthesis algorithm and is placed
between two levels of abstraction (i.e., input specification and gate-level
circuit). In this work, CCF representation has been selected as an IF as
discussed below.⁵

Table 1: Average distribution of cycle lengths for available benchmark
functions

⁴Some authors reordered the locations of output lines of a given fully
specified specification to improve its final cost [36]. While reordering
circuit lines changes the original function specification, it may be
acceptable for some applications. This approach can also be considered as a
pre-synthesis optimization method.

Compared with the truth table model, CCF removes the fixed rows⁶ of a given
truth table and hence is very efficient for large functions, particularly
those that map many input combinations into themselves. Recall that a
synthesis algorithm returns the changed rows to their right positions. To this
end, a set of reversible gates is applied where fewer gates generally lead to
lower cost. Fixed rows can be removed to save memory if the synthesis
algorithm does not use them directly. In [8], the authors reported that the
applicability of their synthesis algorithm was limited due to the memory
constraints encountered during the representation of large input
specifications in the truth table format.

Moreover, while some truth table-based approaches like the one introduced
in [8] considered both input-to-output and output-to-input transformations at
the same time (namely, the bidirectional method), the mentioned
transformations have equal CCF representations. Therefore, there is no need to
consider both transformations at the synthesis step concurrently. Hence, the
synthesis method has to handle lower complexity.
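
The two points above, dropping fixed rows and the relation between a function
and its inverse, can be illustrated with a short Python sketch. It assumes a
truth table stored as a list where entry `i` is the output for input `i`; the
function names are illustrative. Under this reading, the inverse function has
the same cycles, each traversed in the opposite direction.

```python
def ccf(outputs):
    """Cycles of the permutation i -> outputs[i]; fixed rows are omitted."""
    seen, cycles = set(), []
    for start in range(len(outputs)):
        if start in seen or outputs[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = outputs[x]
        cycles.append(tuple(cycle))
    return cycles


def inverse(outputs):
    """Truth table of the output-to-input transformation."""
    inv = [0] * len(outputs)
    for i, o in enumerate(outputs):
        inv[o] = i
    return inv


def reverse_cycle(cycle):
    """Same cycle traversed backwards, starting at the same element."""
    return (cycle[0],) + tuple(reversed(cycle[1:]))


table = [0, 1, 2, 3, 4, 6, 7, 5]  # rows 0..4 are fixed
print(ccf(table))                 # [(5, 6, 7)]: five of eight rows are dropped
print(ccf(inverse(table)))        # [(5, 7, 6)]
print(ccf(inverse(table)) == [reverse_cycle(c) for c in ccf(table)])  # True
```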

On the other hand, for a given n-input, n-output reversible function, n PPRM
expansions can be extracted which remove explicit values of truth table rows.
Of course, the truth table rows can be recovered from the PPRM expansions
with further processing cost. While PPRM notation has attracted attention in
some synthesis algorithms, it cannot be used in the proposed method since
explicit row values are needed in this paper. Altogether, CCF benefits from
the compact notation of PPRM expansions as well as the explicit values of the
truth table representation. Therefore, CCF is used as the selected IF in the
proposed synthesis methodology.

Having an input specification in the CCF format, the next step is to
synthesize it according to [19] where small cycles are synthesized by the
suggested building blocks directly. Table 1 shows the average distribution of
cycle lengths for the benchmark functions [34]. As shown in this table, more
than 60% of cycle lengths are greater than 5. Therefore, many cycles should be
decomposed into the proposed set of cycles and hence, cycle decomposition can
affect the synthesis costs considerably. In the following, the effects of
cycle decomposition on the synthesis results are examined.

⁵CCF has been used to describe the input specification in [11, 19] to some
extent. However, using CCF as an intermediate format is introduced here for
the first time.

⁶A truth-table row is called fixed if it is mapped into itself.
4.3 Cycle Decomposition

Since each decomposed cycle should be synthesized by a set of reversible
gates, reducing the number of decomposed cycles is preferred in [19] to reduce
the final cost. Moreover, the cycles produced by decomposing disjoint cycles
are disjoint too. Hence, they can commute to find the best possible selection
of cycle pairs for having lower synthesis cost. Altogether, each large cycle
should be decomposed so that the minimal number of inequivalent 5-cycles is
generated and the number of disjoint 5-cycles is maximized. For a given cycle
$\pi$ of length $n$, $N_5(n)$ is used as the minimum number of inequivalent
decomposed 5-cycles.

More precisely, to decompose a given large cycle of length n into a set of 
5-cycles, we impose the following conditions: 

• All decomposed cycles should be of length 5 except at most one cycle 
which is of length less than 5. 

• Cycle factorization should be minimal. 

• Inequivalent cycle factorization is considered. 

• Maximum number of disjoint cycles should be produced. 

The first three conditions can be addressed by using the results of Theorem
2 for $i_j = 1$, where $j$ is equal to 2, 3 or 4, and $i_5 = N_5(n)$. To
address the last condition some modifications are required.

Lemma 1 Consider a cycle $\pi$ of length $n$. The maximum number of disjoint
cycles resulting from an inequivalent 5-cycle factorization is
$\lfloor n/5 \rfloor$.

Proof Since there are $n$ distinct elements in $\pi$ and each 5-cycle has five
distinct elements, at most $\lfloor n/5 \rfloor$ disjoint 5-cycles can result. □

For a cycle $\pi$ of length $n$, assume that all disjoint 5-cycles are detached.
According to the minimal factorization together with Equation (3), we have
$4 \times \lfloor n/5 \rfloor + (L - 1) = n - 1$, where $L$ is the length of the
resulting non-disjoint cycle after detaching all disjoint 5-cycles (denoted as
$\pi'$ in the following). It can be verified that $L$ is equal to
$n - 4 \times \lfloor n/5 \rfloor$. Note that $\pi'$ includes at most four
elements of $\pi$ which do not belong to the detached 5-cycles. In addition, it
has $\lfloor n/5 \rfloor$ elements of $\pi$, each of which belongs to exactly
one disjoint cycle, inserted to recover the original cycle $\pi$ from the set
of disjoint 5-cycles. Considering the minimal length of $\pi'$, there is
exactly one element in $\pi'$ for each disjoint 5-cycle.
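The bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the two quantities defined in the text; the function name `split_cycle_lengths` is hypothetical and not part of the paper.

```python
def split_cycle_lengths(n):
    """Return (d, L) for an n-cycle: d disjoint 5-cycles are detached and a
    non-disjoint cycle of length L remains."""
    d = n // 5        # floor(n/5) disjoint 5-cycles (Lemma 1)
    L = n - 4 * d     # solves 4*d + (L - 1) = n - 1
    return d, L


for n in (13, 18, 19):
    d, L = split_cycle_lengths(n)
    # Minimality condition used in the text.
    assert 4 * d + (L - 1) == n - 1
    print(n, d, L)
```

For n = 19 this gives three disjoint 5-cycles and a remaining cycle of length 7, which matches the cycle lengths used in Example 4.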

Lemma 2 Consider a cycle $\pi$ of length $n$. The minimum number of decomposed
5-cycles, $N_5(n)$, resulting from an inequivalent 5-cycle factorization is

$$N_5(n) = \begin{cases} (n-1)/4 & \text{if } n \bmod 4 = 1 \\ (n-j)/4 & \text{otherwise} \end{cases}$$

where $j \in \{2, 3, 4\}$ is the length of the single cycle shorter than 5.

Proof Based on the definition of inequivalent 5-cycle factorization and
Equation (3), the factorization contributes $(j - 1) \times i_j + 4 \times
N_5(n)$ if $j$ = 2, 3 or 4, and

$$i_j = \begin{cases} 1 & n \bmod 4 = 0, 2, 3 \\ 0 & n \bmod 4 = 1 \end{cases}$$

Considering the definition of minimal factorization, this quantity equals
$n - 1$, and by doing some arithmetic manipulations the lemma is proved. □
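As a worked instance of the counting argument in the proof, the minimality constraint can be solved directly. The floor form below is a restatement introduced here for illustration (it is not quoted from the text), and the three numeric lines use the cycle lengths that appear in the examples of this section.

```latex
\begin{align*}
4\,N_5(n) + (j-1)\,i_j &= n-1, \qquad i_j \in \{0,1\},\; j \in \{2,3,4\} \\
\Longrightarrow\quad N_5(n) &= \left\lfloor \tfrac{n-1}{4} \right\rfloor,
  \qquad (j-1)\,i_j = (n-1) \bmod 4 \\[4pt]
n = 19:&\quad 4 \cdot 4 + 2 = 18 \quad (\text{four 5-cycles and one 3-cycle}) \\
n = 18:&\quad 4 \cdot 4 + 1 = 17 \quad (\text{four 5-cycles and one 2-cycle}) \\
n = 13:&\quad 4 \cdot 3 + 0 = 12 \quad (\text{three 5-cycles only})
\end{align*}
```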

In order to have both the minimum number of decomposed cycles and the
maximum number of disjoint cycles for a given cycle $\pi$, the order of elements
in each disjoint cycle should be the same as in the original cycle $\pi$;
otherwise some extra cycles should be inserted to construct the given
permutation. Consider the following example for more detail:

Example 4 Consider a cycle $\pi$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19) of length $n = 19$. Accordingly, $N_5(n) = 4$ and
$\lfloor n/5 \rfloor = 3$. Therefore, a total of four decomposed 5-cycles
exist, three of which are disjoint. The decomposition $\pi$ = (1,2,3,4,5)
(6,7,8,9,10) (11,12,13,14,15) (16,17,18,19,1) (6,11,16) meets the conditions.
On the other hand, if the order of elements in the disjoint cycles is modified,
several extra cycles should be included. As an example, the decomposition
$\pi$ = (1,2,4,5,6) (3,7,8,9,10) (11,12,13,14,15) (16,17,18,19,1) (3,11,16)
(4,3,7) is not minimal.
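The first decomposition of Example 4 can be checked mechanically. The sketch below assumes that a product of cycles is applied left to right (a convention chosen here because it reproduces the 19-cycle; the text does not state it explicitly). The helper names are hypothetical. For the second decomposition only the weight, i.e. the sum of (length - 1), is compared against n - 1.

```python
def apply_left_to_right(cycles, x):
    """Image of x under the product of cycles, applied left to right."""
    for c in cycles:
        if x in c:
            x = c[(c.index(x) + 1) % len(c)]
    return x


def weight(cycles):
    """Sum of (length - 1); equals n - 1 for a minimal factorization."""
    return sum(len(c) - 1 for c in cycles)


pi = tuple(range(1, 20))  # the 19-cycle (1, 2, ..., 19)
first = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15),
         (16, 17, 18, 19, 1), (6, 11, 16)]
second = [(1, 2, 4, 5, 6), (3, 7, 8, 9, 10), (11, 12, 13, 14, 15),
          (16, 17, 18, 19, 1), (3, 11, 16), (4, 3, 7)]

# The first decomposition reproduces pi element by element.
reproduces_pi = all(
    apply_left_to_right(first, x) == pi[(pi.index(x) + 1) % len(pi)]
    for x in pi
)
print(reproduces_pi, weight(first), weight(second))
```

The first decomposition has weight 18 = n - 1, whereas the second has weight 20, which is why it cannot be minimal.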

According to the above discussion, the elements of each 5-cycle should have
exactly the same ordering as the original large cycle $\pi$. Now, let us examine
the elements of $\pi'$. As explained, it may contain at most four elements which
do not belong to any disjoint 5-cycle. Consider $a_k \in \pi'$ where $a_k$ does
not belong to any detached disjoint 5-cycle. There are three cases regarding
the element $a_k$ as follows:

• Three successive elements $a_{k-1}$, $a_k$, and $a_{k+1}$ belong to $\pi'$

• Two successive elements $a_{k-1}$, $a_k$ or $a_k$, $a_{k+1}$ belong to $\pi'$

• Only $a_k$ belongs to $\pi'$

It can be verified that the predecessor ($a_{k-1}$) and the successor
($a_{k+1}$) of $a_k$ in the first case are placed at the right locations. On
the other hand, for the second and the third cases, some extra cycles should be
inserted to fix the locations of the predecessor or the successor (or both)
elements. Therefore, to have the minimum number of decomposed 5-cycles, the
ordering of those elements which do not belong to any disjoint cycle should be
the same as in the original large cycle.

Theorem 3 Consider a cycle $\pi = (a_1, a_2, \ldots, a_n)$ of length $n > 5$
which should be decomposed into the minimum number of inequivalent 5-cycles,
$N_5(n)$, where the number of disjoint 5-cycles should be maximized (i.e.,
$\lfloor n/5 \rfloor$). Assume that $L^{(0)} = n$ and
$L^{(i)} = L^{(i-1)} - 4 \times \lfloor L^{(i-1)}/5 \rfloor$. Then, there are
$N_{DCM}(n) = \prod_{i=0}^{k} L^{(i)}$ ways for such factorizations where
$L^{(k+1)} < 5$.
ExtractDecompositions(pi) {
  n = length of pi
  if n > 5 { // at least one cycle can be extracted
    for i in 1 to n {
      FirstDecompositionResults =
        extract all disjoint cycles starting from the ith element of pi
      newCycle = construct a non-disjoint cycle pi'
      SecondDecompositionResults = ExtractDecompositions(newCycle)
      for j in 1 to length of SecondDecompositionResults {
        // merge the results
        append {FirstDecompositionResults, SecondDecompositionResults[j]}
          to AllPossibleDecomposition
      }
    }
  } else {
    return pi
  }
  return AllPossibleDecomposition
}


Figure 6: Extracting all possible decompositions 
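The pseudo code of Figure 6 can be turned into a small runnable sketch. Several points are assumptions made here rather than statements from the text: the non-disjoint cycle pi' is built from the leftover elements followed by the first element of each detached 5-cycle (this construction is inferred from Examples 4 and 5), products are applied left to right, and the recursion stops once a cycle has length at most 5 (following the `n > 5` guard in the figure). Function names are hypothetical.

```python
def extract_decompositions(pi):
    """Return a list of decompositions; each one is a list of cycles (tuples)
    whose left-to-right product is the cycle pi."""
    pi = tuple(pi)
    n = len(pi)
    if n <= 5:                       # short enough to be used directly
        return [[pi]]
    results = []
    d = n // 5                       # number of disjoint 5-cycles to detach
    for i in range(n):               # every rotation of pi is a starting point
        rotated = pi[i:] + pi[:i]
        disjoint = [rotated[5 * k:5 * k + 5] for k in range(d)]
        leftover = rotated[5 * d:]   # at most four elements outside the 5-cycles
        # Assumed construction of pi': leftover elements followed by the
        # first element of each detached 5-cycle.
        new_cycle = leftover + tuple(c[0] for c in disjoint)
        for tail in extract_decompositions(new_cycle):
            results.append(disjoint + tail)
    return results


def apply_left_to_right(cycles, x):
    """Image of x under the product of cycles, applied left to right."""
    for c in cycles:
        if x in c:
            x = c[(c.index(x) + 1) % len(c)]
    return x


pi = tuple(range(1, 19))             # an 18-cycle (1, 2, ..., 18)
decs = extract_decompositions(pi)
print(len(decs))                     # number of generated decompositions
print(decs[0])                       # first generated decomposition
```

The sketch enumerates rotations exhaustively, so it is only practical for short cycles; it is meant to make the recursion concrete, not to reproduce the authors' implementation.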

Proof To have the minimum number of inequivalent 5-cycles and the maximum
$\lfloor n/5 \rfloor$ disjoint 5-cycles, the ordering of elements in each
disjoint 5-cycle should be the same as the ordering of $\pi$. Moreover, for
those elements which do not belong to any disjoint 5-cycle, the same ordering
of $\pi$ should be used in $\pi'$. Therefore, the sequence of all elements
should be preserved. Since we have $\pi = (a_1, a_2, \ldots, a_n) = (a_2, a_3,
\ldots, a_n, a_1) = \cdots = (a_n, a_1, \ldots, a_{n-1})$, there are
$L^{(0)} = n$ ways of such decomposition. After detaching all disjoint
5-cycles, a non-disjoint cycle $\pi'$ of length
$L^{(1)} = n - 4 \times \lfloor n/5 \rfloor$ results, which can be decomposed
into a set of 5-cycles in $L^{(1)}$ ways. This process can be continued until a
non-disjoint cycle of length less than 5 is produced. Considering all ways of
decompositions leads to the theorem. □

For a given cycle $\pi$ of length $n$, a recursive procedure can be applied in
relation to the proof of Theorem 3 to extract all decompositions. Figure 6
illustrates the pseudo code.

Example 5 Consider a cycle (1, 2, 3, ..., 18) of length $n = 18$. This cycle
can be decomposed into $\lfloor 18/5 \rfloor = 3$ disjoint cycles in
$L^{(0)} = 18$ ways. After detaching all disjoint cycles, a non-disjoint cycle
of length $L^{(1)} = 18 - 4 \times \lfloor 18/5 \rfloor = 6$ is produced which
can be decomposed into $\lfloor 6/5 \rfloor = 1$ 5-cycle in $L^{(1)} = 6$
different ways. Hence, 18 × 6 different decompositions are generated. The
following items list four possible decompositions:

• (1, 2, 3, 4, 5)(6, 7, 8, 9, 10)(11, 12, 13, 14, 15)(16, 17, 18, 1, 6)(11, 16)

• (1, 2, 3, 4, 5)(6, 7, 8, 9, 10)(11, 12, 13, 14, 15)(17, 18, 1, 6, 11)(16, 17)

• (1, 2, 3, 4, 5)(6, 7, 8, 9, 10)(11, 12, 13, 14, 15)(18, 1, 6, 11, 16)(17, 18)

• (18, 1, 2, 3, 4)(5, 6, 7, 8, 9)(10, 11, 12, 13, 14)(15, 16, 17, 18, 5)(15, 10)

Figure 7: The results of cycle decomposition step

There are many ways of decomposing a given large cycle into a set of cycles of 
length less than 6 with minimum number of decomposed cycles and maximum 
number of disjoint cycles. In the next subsection, the process of selecting cycle 
pairs is evaluated. 

4.4 Cycle Assignment 

For a given cycle $\pi$ of length $n$, $N_{DCM}(n)$ different decompositions are
possible where each decomposition includes $N_5(n)$ decomposed 5-cycles with
$\lfloor n/5 \rfloor$ disjoint 5-cycles. Non-disjoint cycles cannot be
arbitrarily moved. Figure 7 illustrates the result of the cycle decomposition
step. In this figure, an input specification with $N$ cycles is shown where the
$i$-th cycle was decomposed into $\lfloor n_i/5 \rfloor$ 5-cycles in $M_i$
different ways ($n_i > 5$ and $M_i = N_{DCM}(n_i)$) denoted as DCM #1, ...,
DCM #$M_i$. Now, one can select one of the available decompositions for each
input cycle to construct a set of elementary cycles of size
$\lfloor n_1/5 \rfloor + \lfloor n_2/5 \rfloor + \cdots + \lfloor n_N/5 \rfloor$.
Next, cycle pairs should be assigned to be used by the synthesis algorithm as
follows.

In order to find cycle pairs, we model the cycle assignment step as a graph
perfect matching problem. For a set with $N$ elementary cycles,
$N \times (N - 1)/2$ cycle pairs can be determined where each pair can be
synthesized with a specific quantum cost. Since each cycle pair can be
considered as a valid cycle assignment, we first synthesize each cycle pair
using the method of [19]. Then, a weighted graph is constructed with $N$ nodes
and $N \times (N - 1)/2$ edges. The actual synthesis quantum cost for each
cycle pair is used as the weight of the edge between the respective nodes.
Next, a graph perfect matching algorithm is applied to find the best possible
matching with the minimum cost. Therefore, the cycle assignments which produce
the lowest total cost are found.

Figure 8: Cycle assignment. Different nodes represent different disjoint cycles.
An edge between two nodes denotes the possibility of synthesizing the cycles as
a pair. Each edge carries a weight which is the synthesis cost of the involved
cycles.

Figure 8 illustrates the cycle assignment problem for the generated disjoint
5-cycles. As can be seen in this figure, there are 8 disjoint 5-cycles which
construct a complete graph on 8 nodes. A possible cycle assignment is shown by
solid edges. It is worth noting that since all cycles of a given input
specification are disjoint, the resulting set of 2-cycles contains only
disjoint cycles. Therefore, it is possible to apply the cycle assignment step
to the elementary 2-cycles too. Similarly, this process can be repeated for all
3-cycles and 4-cycles.

Example 6 Consider a given input specification with two cycles $\pi_1$ and
$\pi_2$ of lengths 18 and 13, respectively. It can be verified that
$N_{DCM}(18) = 108$ and $N_{DCM}(13) = 13$. In addition, the decomposition of
$\pi_1$ and $\pi_2$ leads to $\lfloor 18/5 \rfloor = 3$ and
$\lfloor 13/5 \rfloor = 2$ disjoint cycles. Therefore, a set of five disjoint
cycles results. Now, a complete weighted graph with 5 nodes and 10 edges is
constructed where nodes represent cycles and edges represent the possibility of
synthesizing the connected cycles as a pair. Edge weights are the actual
synthesis costs. After running the perfect matching algorithm, two cycle pairs
are selected to be synthesized with each other and the remaining cycle is
synthesized alone.
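The pairing step of Example 6 can be illustrated with a small exhaustive search. This is a sketch only: the paper relies on a dedicated perfect matching implementation, whereas the code below uses a bitmask dynamic program that is practical for a handful of cycles. The cost values, the `solo_cost` list for a cycle synthesized alone, and the function name `best_assignment` are all hypothetical.

```python
from functools import lru_cache


def best_assignment(pair_cost, solo_cost):
    """Minimum-cost grouping of cycles into pairs; when the number of cycles
    is odd, exactly one cycle is left to be synthesized alone."""
    n = len(solo_cost)

    @lru_cache(maxsize=None)
    def solve(mask):
        # mask is a bitset of the cycles that are still unassigned.
        if mask == 0:
            return 0, ()
        i = (mask & -mask).bit_length() - 1      # lowest unassigned cycle
        rest = mask & ~(1 << i)
        best = None
        if bin(mask).count("1") % 2 == 1:        # odd count: i may stay alone
            cost, plan = solve(rest)
            best = (cost + solo_cost[i], ((i,),) + plan)
        for j in range(i + 1, n):
            if rest & (1 << j):
                cost, plan = solve(rest & ~(1 << j))
                cand = (cost + pair_cost[i][j], ((i, j),) + plan)
                if best is None or cand[0] < best[0]:
                    best = cand
        return best

    return solve((1 << n) - 1)


# Made-up symmetric costs for five disjoint cycles (illustration only).
pair_cost = [
    [0, 10, 1, 5, 100],
    [10, 0, 5, 1, 100],
    [1, 5, 0, 10, 100],
    [5, 1, 10, 0, 100],
    [100, 100, 100, 100, 0],
]
solo_cost = [50, 50, 50, 50, 3]
total, plan = best_assignment(pair_cost, solo_cost)
print(total, plan)
```

With these invented weights, two pairs are selected and one cycle is left alone, mirroring the outcome described in Example 6. For larger graphs a polynomial-time matching algorithm is the appropriate tool.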

In addition to the effect of cycle assignment on the synthesis cost, the order
of elements in each cycle affects the synthesis result. More precisely, consider
two disjoint 5-cycles $\pi_1^{(1)} = (a_1, a_2, a_3, a_4, a_5)$ and
$\pi_2^{(1)} = (a_6, a_7, a_8, a_9, a_{10})$ where $a_i \ne a_j$ if $i \ne j$,
$1 \le i, j \le 10$. It can be seen that these cycles can be written, for
example, as $\pi_1^{(2)} = (a_4, a_5, a_1, a_2, a_3)$ and
$\pi_2^{(2)} = (a_{10}, a_6, a_7, a_8, a_9)$ too. However, direct synthesis of
$\pi_1^{(1)}$ and $\pi_2^{(1)}$ may be better or worse than synthesizing
$\pi_1^{(2)}$ and $\pi_2^{(2)}$. To remove the effect of element ordering on
the synthesis cost, we synthesize each two disjoint cycles in all possible ways
(e.g., for two disjoint 5-cycles, 25 different ways are explored). Next, the
best possible synthesis cost is assigned as the weight of the related edge.
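The exploration of all orderings can be sketched as follows. The function `toy_cost` is a placeholder invented for this illustration (it is not a synthesis cost model from the paper); in practice it would be replaced by the quantum cost returned by the pair synthesis routine.

```python
from itertools import product


def rotations(cycle):
    """All cyclic rewritings of a cycle, e.g. (a, b, c) -> (b, c, a) -> ..."""
    cycle = tuple(cycle)
    return [cycle[i:] + cycle[:i] for i in range(len(cycle))]


def best_pair_cost(c1, c2, synth_cost):
    """Minimum cost over every pair of rewritings of two disjoint cycles."""
    return min(synth_cost(r1, r2)
               for r1, r2 in product(rotations(c1), rotations(c2)))


def toy_cost(r1, r2):
    # Placeholder cost (assumption): Hamming distance between the leading
    # elements of the two rewritings.
    return bin(r1[0] ^ r2[0]).count("1")


c1, c2 = (1, 2, 3, 4, 5), (6, 7, 8, 9, 10)
n_ways = len(rotations(c1)) * len(rotations(c2))
edge_weight = best_pair_cost(c1, c2, toy_cost)
print(n_ways, edge_weight)
```

For two 5-cycles the search visits 5 × 5 = 25 rewritings, and the minimum becomes the weight of the corresponding edge in the matching graph.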

Assume that a specification with $k$ cycles of lengths $n_1, n_2, \ldots, n_k$
is given. A cycle of length $n_i$ ($1 \le i \le k$) can be decomposed in
$N_{DCM}(n_i)$ different ways, each of which includes $\lfloor n_i/5 \rfloor$
disjoint 5-cycles. Therefore, by selecting one of the available decompositions
for each input cycle, $\sum_{i=1}^{k} \lfloor n_i/5 \rfloor$ disjoint 5-cycles
result (Fig. 7), which lead to a complete graph with

$$\sum_{i=1}^{k} \lfloor n_i/5 \rfloor \times \Big(\sum_{i=1}^{k} \lfloor n_i/5 \rfloor - 1\Big)/2$$

edges (Fig. 8). Hence, the total time complexity required to select an
appropriate cycle assignment for such a decomposition is

$$25 \times \sum_{i=1}^{k} \lfloor n_i/5 \rfloor \times \Big(\sum_{i=1}^{k} \lfloor n_i/5 \rfloor - 1\Big)/2 \times O(synthesis) + O(matching).$$

Consideration of all possible cycle decompositions leads to

$$\prod_{i=1}^{k} N_{DCM}(n_i) \times \Big(25 \times \sum_{i=1}^{k} \lfloor n_i/5 \rfloor \times \Big(\sum_{i=1}^{k} \lfloor n_i/5 \rfloor - 1\Big)/2 \times O(synthesis) + O(matching)\Big).$$

As can be seen, the time complexity of evaluating all possible cycle
decompositions is very large. In the experimental results section, the runtime
for each benchmark was limited to a reasonable time. Since no cycle
decomposition is required for other elementary cycles, much less time will be
required to select cycle pairs among the available 2-, 3- and 4-cycles.
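To get a feeling for these quantities, the counts can be evaluated for the two cycles of Example 6. The sketch below is illustrative: `n_dcm` multiplies the successive lengths of Theorem 3 and stops once the remaining cycle has length at most 5, an assumption chosen to match the `n > 5` guard of Figure 6 and the values quoted in Example 6. Function names are hypothetical, and only the number of pair syntheses is counted (the matching cost is left out).

```python
from math import prod


def n_dcm(n):
    """Number of decompositions of an n-cycle (product of successive lengths)."""
    count, length = 1, n
    while length > 5:                 # assumed stopping rule, see lead-in
        count *= length
        length -= 4 * (length // 5)
    return count


def assignment_workload(lengths):
    """Return (nodes, edges, pair syntheses per choice, pair syntheses overall)."""
    nodes = sum(n // 5 for n in lengths)       # disjoint 5-cycles
    edges = nodes * (nodes - 1) // 2           # complete graph
    per_choice = 25 * edges                    # 25 orderings per cycle pair
    choices = prod(n_dcm(n) for n in lengths)  # one decomposition per cycle
    return nodes, edges, per_choice, choices * per_choice


print(n_dcm(18), n_dcm(13))
print(assignment_workload((18, 13)))
```

Even for this small specification, exhausting every decomposition choice requires several hundred thousand pair syntheses, which illustrates why the runtime has to be bounded in practice.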



4.5 Post-Synthesis Optimization 

Finding the optimal realization for a given reversible specification needs the
evaluation of an exponential search space.† Therefore, it is very
time-consuming to obtain an optimal realization for a given middle-size
reversible specification.‡ As a result, the usefulness of exact synthesis
methods is limited to relatively small specifications. In addition, there are
various metrics besides gate count or quantum cost [38] that can be considered
in the synthesis stage to improve the synthesized results. Altogether, due to
various complexities involved in the synthesis of reversible circuits, there is
a need to improve the quality of synthesized circuits in a post-processing
step.

Previously, a few post-synthesis optimization methods have been introduced
which used some pre-defined gate patterns (called templates) [24] or a
well-developed data structure [35] for the optimization of synthesized
circuits. In this paper, we use the method of [35] as a post-synthesis
optimization algorithm as discussed in Section 5.

† Consider a quantum circuit of size $n$. Suppose that the optimal realization
of a reversible specification needs $h$ gates from a library of size $M$. It
can be verified that an exhaustive method needs the evaluation of $M^h$ gate
sequences where $M = O(n \times 2^n)$ as follows: There are $C_n^1$ possible
NOT gates and $C_n^2$ possible CNOT gates in which one of the two inputs can be
the target output. Hence, the total number of $2 \times C_n^2$ CNOT gates can
be obtained. In contrast, for a $(k+1)$-bit gate, $k \in \{2, 3, \ldots,
n-1\}$, there are $C_{n-1}^k$ possible gates when the target is the $i$-th
($i \in [1, n]$) bit. Considering all possible bits as the target leads to the
total number of $n \times C_{n-1}^k$ $(k+1)$-bit gates. Therefore, the total
number of gates is
$C_n^1 + 2 \times C_n^2 + n \times \sum_{k \in \{2, \ldots, n-1\}} C_{n-1}^k = n \times 2^{n-1}$.

‡ The evaluation of synthesized circuits should be done with respect to a
specific metric. Quantum cost or gate count can be used for this purpose.
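The gate count used in the footnote on the exponential search space, n NOT gates, 2·C(n,2) CNOT gates, and n·C(n-1,k) gates with k controls for k = 2, ..., n-1, can be checked numerically. The function name `nct_library_size` is hypothetical; the formula itself is the one stated in the footnote.

```python
from math import comb


def nct_library_size(n):
    """Number of distinct NOT, CNOT and multiple-control Toffoli gates on n lines."""
    nots = comb(n, 1)
    cnots = 2 * comb(n, 2)            # either line of the pair can be the target
    larger = n * sum(comb(n - 1, k) for k in range(2, n))  # k controls, 1 target
    return nots + cnots + larger


for n in range(1, 12):
    assert nct_library_size(n) == n * 2 ** (n - 1)
print(nct_library_size(3), nct_library_size(10))
```

Since the library size grows as n · 2^(n-1), an exhaustive search over sequences of h gates quickly becomes infeasible, which is the motivation for the post-processing step.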



5 Experimental Results 

The proposed library-based synthesis methodology was implemented in C++
and all of the experiments were done on an Intel Pentium IV 2.2GHz computer
with 2GB memory. In order to find a perfect matching on a given graph, we
used the Blossom V implementation [39]. In addition, we used two recent
synthesis tools proposed in [8] and [35] for our comparisons. To the best of
our knowledge, these are the most recent relevant works on reversible synthesis
algorithms. In particular, [35] is similar to our synthesis algorithm with
respect to using NOT gates and cycles. The application of exact methods like
[13, 34] for finding optimal circuits is limited to small functions.

In all experiments, the post-synthesis optimization algorithm proposed in
[35] was applied to simplify circuits produced by our synthesis methodology. In
addition, the synthesis algorithm of [8] was applied in 'synthesized/
resynthesized using 3 methods' mode for circuits with $n < 15$ ($n$ is the
circuit size) and in 'synth/resynth with MMD (15+ variables)' mode for
$n \ge 15$. For [8], the synthesis algorithm, the templates matching method,
and the random and exhaustive driver algorithms were applied sequentially to
synthesize each function with a time limit of 12 hours as in [8]. Bidirectional
and quantum cost reduction modes were also applied.

To evaluate the proposed synthesis methodology, the completely specified
reversible benchmark functions (no DC) with more than six variables [34] were
examined, as library elements were designed for more than six variables in
[19]. Note that for small circuits, several well-developed exact and heuristic
methods have been proposed (e.g., [8]). We first fixed zero and $2^i$ terms by
applying a few Toffoli and CNOT gates in a pre-synthesis optimization step.
Then, other parts of the proposed methodology were applied. To compare the
results, we evaluated all synthesis algorithms in terms of quantum cost and the
number of garbage bits. Quantum costs were calculated based on [24].

The results of our synthesis algorithm and the previous best-proposed circuits
that used the same gate library are reported in Table 3. Headings 'w/ g' and
'w/o g' stand for 'with garbage' and 'without garbage', respectively. In
addition, '# g' denotes the number of garbage lines. The symbol '-' is used if
the algorithm fails to synthesize the circuit in 12 hours.

The synthesis tool of [8] failed to synthesize the functions urf4 and urf6
after 12 hours. For the urf1, urf2, urf3, and urf5 functions, several circuits
were reported in [40]. The resulting costs for these circuits are 45855, 16152,
121716, and 24253, respectively. Since applying the method of [8] significantly
improves the previous costs, we reported the new ones in Table 3.

Since the number of valid decompositions for each cycle grows rapidly with
the size of functions, for each benchmark function, we limited the runtime to 30
minutes and evaluated a limited set of decompositions for each cycle to find the
best possible cost. Table 2 shows the CPU time and the peak memory usage
of the proposed synthesis methodology for each function. As illustrated in this
table, the required CPU time for the decomposition step is less than five
minutes for each circuit. In addition, the post-synthesis optimization step
needs less than 5 minutes on average. The cycle-assignment step, which includes
the evaluation of all possible cycle pairs for finding the best synthesis cost,
is the only time-consuming step. The required run time for other steps of
Fig. 5 is negligible. As discussed, the best available synthesis algorithm
needs about 12 hours to synthesize the available benchmarks (e.g., hwb11).
Hence, the potential of the proposed synthesis methodology in synthesizing
large functions is considerable.

Table 2: The CPU time and peak memory usage of the proposed synthesis
methodology

Benchmark    n    Modified    CPU time                                       Peak memory
function          rows (%)    Decomposition     CA           Optimization    (Mbytes)
                              (milliseconds)    (minutes)    (minutes)
ham7         7    62          3                 29           1               86
hwb7         7    92          5                 29           1               87.5
hwb8         8    96          5                 28           2               90
hwb9         9    97          14                25           5               99.5
hwb10        10   97          18                26           4               250
hwb11        11   99          103               24           6               263
cycle10_2    12   43          100600            23           5               980
urf1         9    92          217011            21           5               453
urf2         8    89          43867             26           3               290
urf3         10   93          256352            20           5               1220
urf4         11   98          2568              29           1               890
urf5         9    85          711               29           1               90
urf6         15   ~           945               29           1               530


As demonstrated in Table 2, the proposed synthesis methodology needs up to
1.3 GBytes of memory to synthesize each benchmark function. In this table, the
percentage of modified rows in the truth-table representation is also reported.
As discussed in Section 4.2, while all rows should be kept in memory for the
truth-table representation, only modified rows need to be represented in CCF.
As shown in Table 2, while for some functions (e.g., hwb11) the CCF
representation is not very efficient compared with the truth-table
representation, for some others (e.g., urf6) the CCF representation is very
efficient. Altogether, CCF needs to represent about 20% fewer rows on average.

Table 3 shows the synthesis results. In this table, the synthesis costs of
applying the method of [19] for only one decomposition and with a trivial cycle
assignment, where consecutive cycles are assigned to each other, are shown
(1-Way DCM+CA). As shown in Table 3, our synthesis costs for almost all
functions are better than the costs of other methods. Since all of the
attempted functions are even permutations, they can be implemented by the
NCT library with no additional garbage line [11]. As the synthesis algorithm of
[8] uses one additional garbage line for the circuits of Table 3 (except ham7
and cycle10_2), the synthesis costs with and without garbage lines are
reported.
Table 3: The comparison costs of our library-based synthesis methodology with
the algorithms of [8], [19] and [35]. Improved results are in bold both for w/
and w/o garbage. For [8], a time limit of 12 hours was applied as done in [8].
The methods of [35] and [19] required a few minutes for each function. At most
30 minutes were required for each circuit in the proposed methodology as shown
in Table 2.

Benchmark    [8]                         [35]       [19]       Our method
functions    # g    w/ g      w/o g      (w/o g)    (w/o g)    (w/o g)
ham7         0      49        49         2695       2117       1804
hwb7         1      2609      2613       4450       3177       2727
hwb8         1      6197      7015       10727      7163       6535
hwb9         1      20378     22510      28135      16283      15462
hwb10        1      46597     59197      64442      36182      34224
hwb11        1      122144    136760     179966     91973      86942
cycle10_2    0      1206      1206       197041     93086      89192
urf1         1      21850     23983      31155      17281      16619
urf2         1      8161      9418       12823      7291       6600
urf3         1      49843     61046      76114      38133      36927
urf4         -      -         -          190058     93992      90696
urf5         1      12782     14225      24086      14876      13930
urf6         -      -         -          34431      17367      16687

6 Conclusion 

In this paper, a synthesis methodology for reversible circuits was proposed
which uses a set of building blocks and a library to synthesize a given
specification. To this end, each input specification is considered as a
permutation with several cycles where each cycle is synthesized by some
reversible gates. If a given cycle is found in the library, it is synthesized
directly; otherwise, the proposed decomposition algorithm detaches the building
blocks from the given cycle. The decomposition algorithm explores all possible
minimal and inequivalent factorizations where the number of disjoint cycles is
maximized. To synthesize a given permutation, cycle pairs should be selected to
reduce the synthesis cost. Therefore, a cycle assignment algorithm based on the
graph perfect matching algorithm was also proposed. Experimental results on
reversible functions show the advantage of the proposed approach in reducing
both synthesis cost (i.e., quantum cost and number of garbage lines) and
runtime.